NCHLT: isiXhosa POS tag set

Tag set

For annotation purposes, this tag set is largely adopted from Taljard et al. (2008) and from various documents compiled by G. Faasz and U. Heid (IMS, Stuttgart) and by D.J. Prinsloo and E. Taljard (University of Pretoria). The information below reflects the current state of the tag set; further development will probably require changes.

The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). The logical structure of the tagset is divided into two layers of linguistic description (annotation levels):

The first annotation level (level 1) includes all mandatory (in EAGLES terms, obligatory) information, namely up to three elements: one indicating the word class, a second specifying functional or syntactic properties, and a third giving morphological specifics, e.g. PRO(noun)EMP(hatic)PERS(on).

The second level of annotation (level 2) includes recommended and optional information. This level is in most cases used for a detailed description of closed class items described in the tagger lexicon. Compare the following excerpt:

 

Figure 1: Annotation levels

Description           Tag 1st level             Tag 2nd level
                      (mandatory information)   (optional/recommended information)
Pronouns:
  emphatic personal   PROEMPPERS                1sg, 2sg, 1pl, 2pl
Verbals:              V                         tr
Morphemes:
  deficient           MORPH                     def
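The two annotation levels can be illustrated with a small Python sketch. It assumes that a full tag is written as the level 1 tag, an underscore, and the level 2 value, which is how tags such as PROEMPPERS_1sg and V_tr are written elsewhere in this document. The names Tag and split_tag are hypothetical and not part of the NCHLT resources.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    level1: str            # mandatory part, e.g. "PROEMPPERS"
    level2: Optional[str]  # optional/recommended part, e.g. "1sg"; None if absent


def split_tag(tag: str) -> Tag:
    """Split a full tag at the first underscore into level 1 and level 2.

    Assumption: the underscore is the only separator between the levels.
    """
    level1, _sep, level2 = tag.partition("_")
    if not level1:
        raise ValueError(f"tag has no level 1 part: {tag!r}")
    # An empty level 2 part (no underscore, or nothing after it) becomes None.
    return Tag(level1, level2 or None)


print(split_tag("PROEMPPERS_1sg"))  # Tag(level1='PROEMPPERS', level2='1sg')
print(split_tag("V"))               # Tag(level1='V', level2=None)
```

Because only level 1 is mandatory, a tag without an underscore is treated as complete rather than as an error.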

 

For disjunctive languages, next to all orthographic words, all linguistic words will also be tagged, resulting in two layers of POS annotation: one for all orthographic words and one for all linguistic words. For conjunctive languages, this extra layer of POS annotation is not needed.

The tagset currently distinguishes 20 categories applicable to isiXhosa and two different levels of annotation. However, only level 1 has been annotated. The first part of the tag gives a general indication of the nature of the unit in question. These are as follows:

 

Tag         Explanation
PUNC        Punctuation
ABBR        Abbreviation (incl. acronyms)
ADJ         Adjective (incl. enumerative)
ADV         Adverb
CDEM        Class-indicating demonstrative
CONJ        Conjunction
COP         Copulative (copulative subject concord, demonstrative copulative, copulative verb)
FOR         Foreign
IDEO        Ideophone
INT         Interjection
INTER       Question word
N           Noun
NPP         Place and brand name
NUM         Numerative
POSS        Possessive (possessive concord, possessive pronoun)
PROEMP      Emphatic pronoun
PROQUANT    Quantitative pronoun
REL         Relative
V           Verbal
VAUX        Auxiliary verb

 

 

 

 

Tags not applicable to isiXhosa

ASP         Aspectual marker
AUX         Auxiliary stem
CN          Class-indicating nominal prefix
CO          Class-indicating object concord
CS          Class-indicating subject concord
MNEG        Negative morpheme
PART        Particle
TENS        Tense marker
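As an illustration of how the first part of a tag identifies its category, the following Python sketch maps a level 1 tag such as N01a or PROEMPLOC back to one of the 20 categories applicable to isiXhosa. It uses longest-prefix matching, which is an assumption made here so that NPP and NUM are not mistaken for N, INTER for INT, or VAUX for V. The names CATEGORIES and category_of are hypothetical.

```python
# The 20 level 1 categories listed above as applicable to isiXhosa.
CATEGORIES = {
    "PUNC": "Punctuation",
    "ABBR": "Abbreviation (incl. acronyms)",
    "ADJ": "Adjective (incl. enumerative)",
    "ADV": "Adverb",
    "CDEM": "Class-indicating demonstrative",
    "CONJ": "Conjunction",
    "COP": "Copulative",
    "FOR": "Foreign",
    "IDEO": "Ideophone",
    "INT": "Interjection",
    "INTER": "Question word",
    "N": "Noun",
    "NPP": "Place and brand name",
    "NUM": "Numerative",
    "POSS": "Possessive",
    "PROEMP": "Emphatic pronoun",
    "PROQUANT": "Quantitative pronoun",
    "REL": "Relative",
    "V": "Verbal",
    "VAUX": "Auxiliary verb",
}


def category_of(level1_tag: str) -> str:
    """Return the longest category name that the level 1 tag starts with."""
    matches = [name for name in CATEGORIES if level1_tag.startswith(name)]
    if not matches:
        raise KeyError(f"no applicable category for tag {level1_tag!r}")
    return max(matches, key=len)


print(category_of("N01a"))   # N
print(category_of("INTER"))  # INTER
```

The sketch checks only the prefix, not the remainder of the tag, so it should not be read as a full validity check. Tags from the list of tags not applicable to isiXhosa, such as CS or TENS, match no category and raise KeyError.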

 


PUNCTUATION

Level 1: PUNC

Notes:

Examples:

;               PUNC
(               PUNC
!               PUNC

PUNC

 

ABBREVIATION

Level 1: ABBR

Notes:

Examples:

njl.            ABBR
NGO             ABBR

 

ADJECTIVE

Level 1: ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC

Notes:

Examples:

elide           ADJ05
amancinci       ADJ06
komnye          ADJLOC
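The "Level 1" lines in this document use a compact notation such as ADJ01-11. The Python sketch below expands such a specification into explicit tags. It assumes that a range like 01-11 stands for every two-digit class number from 01 to 11 inclusive; this reading is not stated in the tag set itself. The function name expand_spec is hypothetical.

```python
import re
from typing import List

# Matches a ranged item such as "ADJ01-11": prefix, first class, last class.
RANGE_ITEM = re.compile(r"([A-Z]+)(\d{2})-(\d{2})")


def expand_spec(spec: str) -> List[str]:
    """Expand a comma-separated level 1 specification into explicit tags."""
    tags: List[str] = []
    for item in (part.strip() for part in spec.split(",")):
        match = RANGE_ITEM.fullmatch(item)
        if match:
            prefix = match.group(1)
            first, last = int(match.group(2)), int(match.group(3))
            # Assumption: the range is inclusive and zero-padded to two digits.
            tags.extend(f"{prefix}{n:02d}" for n in range(first, last + 1))
        elif item:
            # Non-ranged items (e.g. "ADJ01a", "ADJLOC") are kept as written.
            tags.append(item)
    return tags


adj_tags = expand_spec("ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC")
print(len(adj_tags))  # 16
print(adj_tags[:3])   # ['ADJ01', 'ADJ02', 'ADJ03']
```

The same function applies unchanged to the specifications given for the other class-indicating categories, for example "N01-11, N14-15, N01a, N02a, NLOC, N00".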

 

ADVERB

Level 1: ADV, ADVLOC

Notes:

Examples:

kakuhle         ADV
jikelele        ADV
ngezantsi       ADVLOC

 

[CLASS-INDICATING] DEMONSTRATIVE

Level 1: CDEM01-11, CDEM14-15, CDEMLOC

Notes:

Examples:

lowo            CDEM01
eli             CDEM05
phaya           CDEMLOC

 

CONJUNCTION

Level 1: CONJ

Notes:

Examples:

kanti           CONJ
ngenxa          CONJ
kwaye           CONJ

 

COPULATIVE

Level 1: COP

Level 2: COP_neg, COP_nil

Notes:

(-be, - and –bilê). For the copulative verb stem –se, the tag COP_neg is used on level 2, as it is for the verb stem –be (< -ba) when it is used in the negative form.

Examples:

akukho          COP
ngubani         COP

 

FOREIGN

Level 1: FOR

Notes:

 

Examples:

act             FOR
guide           FOR

 

IDEOPHONE

Level 1: IDEO

Examples:

rhoqo           IDEO
ngqo            IDEO

 

INTERJECTION

Level 1: INT

Level 2: INT_neg, INT_nil

Notes:

Examples:

na              INT
hayi            INT

 

INTERROGATIVES

Level 1: INTER

Level 2: _man, _time, _loc, _N01a, _N02a

Notes:

Examples:

ingaba          INTER
ntoni           INTER

 

NOUN

Level 1: N01-11, N14-15, N01a, N02a, NLOC, N00

Level 2: _aug, _dim, _loc, _name, _nil

Notes:

Examples:

dayagram        N00
umntu           N01
urhulumente     N01a
abahlali        N02
umzekelo        N03
iziphumo        N08
ubomi           N14
kwicandelo      NLOC

 

PLACE AND BRAND NAME

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

KwaZulu-Natal   NPP
Mars            NPP

 

NUMERATIVE

Level 1: NUM

Notes:

Examples:

2.2             NUM
2005            NUM
74(a)           NUM

 

POSSESSIVE

Level 1: POSS01-11, POSS14-15, POSSLOC, POSSPERS, POSSKA

Level 2: POSSPERS_1pl, POSSPERS_2pl

Notes:

Examples:

wakho           POSS01
labahlali       POSS05
kamasipala      POSSKA

 

EMPHATIC PRONOUN

Level 1: PROEMP01-11, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

Examples:

yena            PROEMP01
kuzo            PROEMPLOC
obona           PROEMP14

 

QUANTITATIVE PRONOUN

Level 1: PROQUANT01-11, PROQUANT14-15, PROQUANTLOC

Notes:

Examples:

bonke           PROQUANT02
zodwa           PROQUANT10
konke           PROQUANT15

 

RELATIVE

Level 1: REL

Notes:

Examples:

abakhoyo        REL
engcono         REL

 

VERBAL

Level 1: V

Level 2: V_tr, V_itr, V_dtr

Notes:

Examples:

afana           V
ukwakha         V
kubalulekile    V

 

AUXILIARY VERB

Level 1: VAUX

Level 2: VAUX_tr, VAUX_itr, VAUX_dtr

Notes:

Examples:

zidla           VAUX
kumele          VAUX
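The level 2 values listed in the sections above can be collected into a simple check. The Python sketch below is illustrative only: it assumes that level 2 values are attached to the level 1 tag with an underscore, and that the noun values (_aug, _dim, _loc, _name, _nil) attach to any noun tag such as N01 or NLOC. Neither assumption is confirmed by the tag set, and since only level 1 has been annotated so far, no corpus data is implied. The names LEVEL2_RULES and level2_is_valid are hypothetical.

```python
import re

# Level 1 pattern -> level 2 values listed for it in this document.
LEVEL2_RULES = [
    (re.compile(r"COP"), {"neg", "nil"}),
    (re.compile(r"INT"), {"neg", "nil"}),
    (re.compile(r"INTER"), {"man", "time", "loc", "N01a", "N02a"}),
    # Assumption: noun level 2 values apply to every noun class tag.
    (re.compile(r"N(\d{2}a?|LOC)"), {"aug", "dim", "loc", "name", "nil"}),
    (re.compile(r"NPP"), {"place", "brand"}),
    (re.compile(r"POSSPERS"), {"1pl", "2pl"}),
    (re.compile(r"PROEMPPERS"), {"1sg", "1pl", "2sg", "2pl"}),
    (re.compile(r"V"), {"tr", "itr", "dtr"}),
    (re.compile(r"VAUX"), {"tr", "itr", "dtr"}),
]


def level2_is_valid(tag: str) -> bool:
    """Check only the level 2 part of a tag; level 1 is not validated here."""
    level1, sep, level2 = tag.partition("_")
    for pattern, allowed in LEVEL2_RULES:
        if pattern.fullmatch(level1):
            # Level 2 is optional, so a bare level 1 tag is accepted.
            return not sep or level2 in allowed
    # No level 2 values are listed for this tag, so none may be attached.
    return not sep


print(level2_is_valid("V_tr"))     # True
print(level2_is_valid("NUM_dim"))  # False
```

Using fullmatch on the level 1 part keeps NUM and NPP from being treated as noun tags, which a plain prefix test on N would do.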